ABSTRACT

The Autonomous Power Expert 
(APEX) system is being developed at NASA 
Lewis Research Center to function as a fault 
diagnosis advisor for a space power 
distribution test bed. APEX is a rule-based 
system capable of detecting faults and 
isolating the probable causes. APEX also 
has a justification facility to provide natural 
language explanations about conclusions 
reached during fault isolation. To help 
maintain the health of the power distribution 
system, additional capabilities have been 
added to APEX. These capabilities will 
allow detection and isolation of incipient 
faults and enable the expert system to 
recommend actions/procedures to correct the 
suspected fault conditions. New capabilities 
for incipient fault detection consist of storage 
and analysis of historical data and new user 
interface displays. After the cause of a fault 
has been determined, appropriate 
recommended actions are selected by rule- 
based inferencing, which provides 
corrective/extended test procedures. Color 
graphics displays and improved mouse- 
selectable menus have also been added to 
provide a friendlier user interface. 

This paper contains a discussion of 
APEX in general and a more detailed 
description of the incipient detection, 
recommended actions, and user interface 
developments during the last year. 


Jerry L. Walters 
National Aeronautics 
and Space Administration 
21000 Brookpark Road 

INTRODUCTION

Our future presence in space will 
require larger and more sophisticated 
working and living environments. Such 
environments will consist of numerous 
integrated subsystems that will have to be 
maintained with a high degree of reliability. 
Primary among the various subsystems is the 
power distribution system that supplies 
electrical energy throughout the space-based 
facility. The availability of space power will 
be finite, and the sharing of limited power 
resources will have to be optimally 
scheduled. If a fault occurs within the power 
distribution system, disruption of scheduled 
power usage will result in a costly loss of 
mission time and could threaten the operation 
of other subsystems such as life support. 

Figure 1 shows a typical power 
distribution test bed designed for space-based 
applications. Electrical energy is collected 
by solar arrays, converted to 20 kHz power, 
and transmitted through power lines to the 
various loads. Power distribution paths are 
opened/closed by using switching devices 
known as Remote Bus Isolators (RBI’s). 

Each RBI contains a number of sensors to 
measure the various operating parameters of 
the power distribution system such as 
current, voltage, power, and power factor. 
Upper level controllers access the sensory 
data and relay the information to a central 
Power Management Controller (PMC). 

When an RBI is tripped because of an 
overcurrent condition attributed to a fault in 
the system, the PMC will attempt to restore 
the lost power by activating alternate RBI’s 
that will reconfigure the power distribution 
system. 

Quick and automatic reconfiguration of
the power distribution system by the PMC 
provides the necessary capability to maintain 
power distribution when a fault occurs. To 
preserve the health of the power distribution 
system, however, the fault must be isolated 
and appropriate recovery procedures must be 
performed to repair the problem. Potential 
power disruptions can also be avoided by 
detecting incipient fault conditions that are, 
at present, nonthreatening to the power 
distribution system but that over a period of 
time will become a fault. Isolation of and 
recovery from a fault condition depend on 
the technical knowledge and experience of 
power systems personnel. Incipient faults 
are detected by continuously monitoring 
sensory data for indications of persistent 
upward or downward trends in any of the 
power distribution system measurements. 

In a real space environment, with a 
limited crew size, space power expertise may 
be unavailable, and with a large number of 
switching devices, routine maintenance 
checks and power system data analyses for 
incipient fault conditions would require a 
significant amount of crew time. Therefore,
autonomous control of space power 
distribution by expert systems with fault 
isolation, fault recovery and incipient 
detection will greatly enhance the reliability 
of the power distribution system and reduce 
the human workload. 

The Autonomous Power Expert 
(APEX) is a software system designed to 
emulate a human expert’s reasoning 
processes in order to solve problems in space 
power distribution. The APEX system 
automatically monitors the operating status of 
the power distribution system and reports any 
anomaly as a fault condition. APEX then 
functions as a diagnostic advisor, aiding the 
user in isolating the cause of the detected 
fault condition and in repairing the power 
distribution system. 

Development work for the current 
design of APEX was based on the Power 
Distribution Unit A (PDUA) subsystem 
shown in figure 1 [Troung 1989]. APEX is 
currently interfaced to the PMC,
which communicates with the Power
Distribution Controller (PDC). APEX sends
a request for data to the PMC. The PMC
acquires the requested data from sensors on
the power distribution switching devices via
the PDC and passes the data to APEX.

When APEX has collected the power 
distribution parameter data, a fault detection 
phase is initiated. 

APEX detects faults by comparing 
expected values to the measured operating 
values (parametric values) obtained from the 
controller. The expected values are 
calculated by APEX from the scheduled 
profile data of the loads connected to the 
PDUA. If no deviations from the expected 
operating state of the PDUA are found, 
APEX will again request data from the PMC 
and re-initiate the fault detection activity 
with the new data. If an anomaly is found 
within the data acquired from the PMC, 
APEX will inform the user that a fault has 
been detected. 
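
The comparison step described above can be sketched in a few lines of Python. This is only an illustration, not the APEX rule set: the `Reading` record, the `detect_faults` function, the sample values, and in particular the 10 percent tolerance are assumptions, since the paper does not state how large a deviation APEX accepts.

```python
from dataclasses import dataclass

# Hypothetical tolerance: the paper does not state the deviation APEX allows.
TOLERANCE = 0.10  # 10 percent relative deviation (assumed)


@dataclass
class Reading:
    device: str      # e.g. "RBI.3/3"
    attribute: str   # e.g. "switch A current"
    measured: float  # value obtained from the controller
    expected: float  # value calculated from the scheduled load profile


def detect_faults(readings, tolerance=TOLERANCE):
    """Return the readings whose measured value deviates from the expected value."""
    faults = []
    for r in readings:
        # Fall back to an absolute comparison when nothing is expected (load off).
        scale = abs(r.expected) if r.expected != 0 else 1.0
        if abs(r.measured - r.expected) / scale > tolerance:
            faults.append(r)
    return faults


# Illustrative sample values only.
readings = [
    Reading("RBI.3/3", "switch A current", measured=5.9, expected=5.0),
    Reading("RBI.3/3", "line voltage", measured=208.5, expected=208.0),
]
flagged = detect_faults(readings)
for r in flagged:
    print(f"fault detected: {r.device} {r.attribute}")
```

If `flagged` is empty, the monitoring cycle would simply request fresh data and repeat, as described above.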

The user can direct APEX to isolate 
the probable cause of the fault. APEX 
accesses information and rules contained in 
its knowledge base, reaches a conclusion, 
and displays to the user the probable cause 
for the detected fault. The user can then ask 
APEX to justify its conclusion and to 
recommend actions to correct the fault. 


IMPLEMENTATION OVERVIEW 

APEX is currently implemented on a 
Texas Instruments Explorer II workstation in 
LISP and employs the Knowledge 
Engineering Environment (KEE) expert 
system shell. APEX consists of an 
integrated set of software, including a 
knowledge base, a database, an inference 
engine, and various support and interface 
software. The knowledge base comprises 
facts and rules that correspond to knowledge 
acquired from the human expert during 
problem solving. The database is the basic
working area where storage and calculations
of sensory data for incipient fault detection
occur. The inference engine is the
reasoning mechanism that, during fault 
isolation, draws conclusions from 
information stored within the knowledge 
base. In choosing the appropriate recovery 
procedures for the isolated fault, APEX also 
relies on the reasoning capabilities of the 
inference engine. Conventional software 
provides the user with an interactive interface
to communicate with APEX and to obtain 
data from various sources such as power 
distribution hardware and planner/scheduler 
software. 

Knowledge is represented within the 
APEX knowledge base mainly by frames, 
semantic triples, and production rules. 

Frames are structures that describe objects or 
classes of objects and their relationships. 
Objects are composed of slots that specify 
the various attributes belonging to each 
object. Individual slots of an object can 
contain declarative information or attached 
procedural functions. Declarative 
information expresses facts about the object, 
whereas procedural functions are programs or 
a set of procedural steps attached to the slot 
producing a particular behavior for the 
object. Within APEX, declarative 
information is represented by semantic triples 
that state information in the form of 
object/attribute/value (i.e., attribute of object =
value). Production rules are "If-Then" 
statements that imply either declarative facts 
or procedural behaviors when the conditional 
statements contained in the premises of the 
rule are found to be true. [Sell 1985] 
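
To make the object/attribute/value form concrete, the following Python sketch stores a few semantic triples and applies one production rule to them. It does not reproduce the KEE representation; the attribute names, values, rule content, and the helper functions `premises_hold` and `fire` are all hypothetical and chosen only for illustration.

```python
# Semantic triples stored as (object, attribute) -> value.
# All object names, attributes, and values here are illustrative only.
facts = {
    ("RBI.3/3", "switch-A-current-status"): "high",
    ("RBI.3/3", "load-voltage-status"): "normal",
}

# A production rule: if every premise triple holds, assert the conclusion triple.
rule = {
    "name": "hypothetical-overcurrent-rule",
    "if": [
        ("RBI.3/3", "switch-A-current-status", "high"),
        ("RBI.3/3", "load-voltage-status", "normal"),
    ],
    "then": ("RBI.3/3", "condition", "overcurrent-suspected"),
}


def premises_hold(rule, facts):
    """True when every premise triple of the rule matches a stored fact."""
    return all(facts.get((obj, attr)) == value for obj, attr, value in rule["if"])


def fire(rule, facts):
    """Assert the rule's conclusion if its premises are true; report whether it fired."""
    if premises_hold(rule, facts):
        obj, attr, value = rule["then"]
        facts[(obj, attr)] = value
        return True
    return False


fired = fire(rule, facts)
print(fired, facts.get(("RBI.3/3", "condition")))
```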

The database contains a historical 
record of data acquired from the switching 
devices in the power distribution system. 
Storage and manipulation of these data are 
accomplished with conventional techniques 
and do not require the use of the inference 
engine. A detailed description of the 
structure and use of the database is given in 
the section on incipient fault detection. 

APEX employs an inference engine 
contained in the Knowledge Engineering 
Environment (KEE) expert system shell 
[KEE 1989]. The inference engine is the 
heart of the expert system; it determines how 
knowledge is represented and processed. By 
operating on the rules within the knowledge 
base, the inference engine can reason and 
draw inferences about the state of the power 
distribution system. The inference engine 
rule processing strategies are commonly 
referred to as forward and backward 
chaining. Forward chaining works from the 
given data to a conclusion by examining the 
premises of the rules to determine if the 
conclusion of a rule can be inferred. If a 
conclusion is inferred, the new facts asserted 
by the conclusion could then cause other
premises in other rules to imply even more
conclusions. Backward chaining works from 
a particular goal and tries to either confirm 
or refute its truth. In the case of backward 
chaining, rules are selected by first matching 
the conclusion of the rules with the stated 
goal. If the true/false values of the premises 
of the matched rule are unknown, the 
premises become subgoals, which then can 
cause other rules to be selected. The goal is 
asserted only when all of the premises and 
subgoals of the goal-matched rule are known 
to be true. In the APEX system, fault 
detection is driven by sensory data and is 
implemented with forward chaining. Fault 
isolation is accomplished with backward 
chaining by giving APEX the goal of finding 
the probable cause of the fault. 
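
The two strategies can be contrasted with a small Python sketch. It is a generic illustration of forward and backward chaining, not the KEE inference engine; the three rules and the fact strings are hypothetical and are not taken from the APEX knowledge base.

```python
# Each rule is (set of premises, conclusion). Facts are plain strings.
# Rule contents are hypothetical, not taken from the APEX knowledge base.
RULES = [
    ({"current above expected"}, "overcurrent"),
    ({"overcurrent", "RBI tripped"}, "fault detected"),
    ({"fault detected", "load voltage low"}, "probable cause: shorted load"),
]


def forward_chain(facts, rules=RULES):
    """Data-driven: keep firing rules until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules=RULES, _seen=frozenset()):
    """Goal-driven: prove the goal, treating unknown premises as subgoals."""
    if goal in facts:
        return True
    if goal in _seen:  # guard against circular rule sets
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, _seen | {goal}) for p in premises
        ):
            return True
    return False


data = {"current above expected", "RBI tripped", "load voltage low"}
print(sorted(forward_chain(data)))
print(backward_chain("probable cause: shorted load", data))
```

Forward chaining starts from `data` and accumulates every conclusion it can reach, which suits detection; backward chaining starts from a single goal and only examines the rules needed to confirm it, which suits isolation of a probable cause.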

APEX also includes various support
software that allows communication with the
outside world. The user interface enables 
APEX to communicate with the operator 
through color graphics display screens and 
menu selections. Using the menu options, 
the user can select the detail level of 
information to be displayed, ask for 
justification of a particular conclusion, and 
request recommended action to correct an 
isolated fault. Other communication links 
provide data acquisition from the power 
distribution system via the lower level 
controllers, and load profile data acquisition 
from a remote scheduling system [Ringer 
1990]. 


INCIPIENT FAULT DETECTION

Faults are detected by comparing the 
parametric values (measured operating 
values) of the power distribution system to 
the expected values and identifying any 
abnormal operating parameters. When the 
detection rules have been exhausted, APEX 
reports to the user whether or not any faults 
were detected. If a fault was detected, the 
user can then ask the expert system to isolate 
the probable cause of the fault. If no 
abnormal conditions were detected, the 
historical data are analyzed for incipient fault
conditions. 

Incipient detection is based on 
statistical linear regression and correlation
analysis of the historical data. As new data 
are received, the parametric values of the 
power distribution system are stored as 
historical data under the appropriate 
attributes for each switching device. Along 
with each measured value, the expected 
value that is calculated by the expert system 
is also saved. The expert system analyzes 
the historical data looking for any indication 
of a parametric attribute that has maintained 
either an upward or downward trend in the 
data values over a period of time. The 
following parametric attributes are stored for 
each device: switch A current, switch B 
current, line voltage, load voltage, and 
power. 

Since the power system is dynamic 
and the measured value fluctuates over a 
period of time during normal operation, a 
parametric ratio of the measured-to-expected 
value is used to identify any increasing or
decreasing trends in the parametric data.
Thus, if the measured and the expected 
values are equal, the ratio will be one. If the 
measured value is higher than the expected 
value, the ratio will be greater than one; if 
the measured value is less than the expected 
value the ratio will be less than one. 
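
A minimal Python sketch of the ratio follows. The function name `parametric_ratio`, the sample values, and the handling of a zero expected value are assumptions; the paper does not say what APEX stores when a load is scheduled off.

```python
def parametric_ratio(measured, expected):
    """Ratio of measured to expected value; 1.0 means the two agree."""
    if expected == 0:
        # Assumption: the ratio is undefined when nothing is expected (load off),
        # so the sample is skipped rather than stored.
        return None
    return measured / expected


# (measured, expected) pairs, illustrative values only.
samples = [(5.0, 5.0), (5.5, 5.0), (4.5, 5.0)]
ratios = [parametric_ratio(m, e) for m, e in samples]
print(ratios)
```

Because the expected value follows the scheduled load profile, a scheduled change in load moves the measured and expected values together and leaves the ratio near one; only an unscheduled drift shows up as a trend.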

Once the data have been stored in the 
database, correlation coefficients are 
calculated for each parametric attribute of 
each switching device. The correlation 
coefficients are calculated in the following 
manner [Trivedi 1982]: 

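The mean value of a variable is found
from

\[ \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i , \]

the time variance from

\[ \sigma_X^2 = \overline{X^2} - (\bar{X})^2 , \]

the parametric variance from

\[ \sigma_Y^2 = \overline{Y^2} - (\bar{Y})^2 , \]

and the covariance of X and Y from

\[ \overline{XY} - \bar{X}\,\bar{Y} \]

where X denotes the time values and Y
the parametric values.

The correlation coefficient r, then, is

\[ r = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\sigma_X \sigma_Y} \]

where the standard error is

\[ \sigma_Y \sqrt{1 - r^2} , \]

the slope is

\[ m = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\sigma_X^2} \]

and the Y-intercept is

\[ b = \bar{Y} - m\bar{X} \]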


A high correlation coefficient, caused
by a parametric ratio trend, indicates that a
temporal relationship exists. The magnitude of
the correlation coefficient lies between zero
and one, and its sign gives the direction of
the trend. A zero indicates that there is no
correlation between the time and historical
parametric data; the closer the magnitude of the
correlation coefficient is to one, the stronger
the time and parametric value correlation.
APEX currently will consider an incipient
fault condition to exist if the correlation
coefficient of a parametric attribute is higher
than 0.75.
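
The statistics described in this section can be computed directly from the stored time values and parametric ratios. The Python sketch below is an illustration rather than the APEX implementation: the function names, the sample data, the handling of constant data, and the choice to compare the magnitude of the correlation coefficient against the threshold (so that downward trends also qualify) are assumptions. Only the 0.75 threshold comes from the text.

```python
import math

THRESHOLD = 0.75  # threshold quoted in the text


def mean(values):
    return sum(values) / len(values)


def trend_statistics(times, ratios):
    """Correlation coefficient, slope, intercept, and standard error of ratios vs. time."""
    x_bar, y_bar = mean(times), mean(ratios)
    var_x = mean([x * x for x in times]) - x_bar ** 2
    var_y = mean([y * y for y in ratios]) - y_bar ** 2
    cov = mean([x * y for x, y in zip(times, ratios)]) - x_bar * y_bar
    if var_x <= 0 or var_y <= 0:
        # Constant data: no trend can be measured (assumed handling).
        return {"r": 0.0, "slope": 0.0, "intercept": y_bar, "std_error": 0.0}
    r = cov / math.sqrt(var_x * var_y)
    slope = cov / var_x
    return {
        "r": r,
        "slope": slope,
        "intercept": y_bar - slope * x_bar,
        "std_error": math.sqrt(var_y) * math.sqrt(max(0.0, 1.0 - r * r)),
    }


def is_incipient(stats, threshold=THRESHOLD):
    # Assumption: the magnitude is compared so that downward trends also qualify.
    return abs(stats["r"]) > threshold


# Illustrative history: six samples of a measured-to-expected ratio drifting upward.
times = [0, 1, 2, 3, 4, 5]
ratios = [1.00, 1.01, 1.03, 1.02, 1.05, 1.06]
stats = trend_statistics(times, ratios)
print(stats, is_incipient(stats))
```

For this sample history the ratio rises steadily, so the correlation coefficient is well above the threshold and the attribute would be flagged; a history of constant ratios yields no trend and is not flagged.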

Once an incipient fault condition has 
been detected, the user can view the results 
of the statistical analysis and also have 
APEX isolate the probable cause of the 
incipient condition. Figure 2 shows a typical 
display indicating a definite increasing trend 
in the ratio between measured values and 
expected values. The trend was detected 
within the switch A current parameter of 
switching device RBI.3/3. Along with the 
plot of the linear regression results, the 
correlation coefficient, slope, standard error, 
and y-intercept are displayed for the user. A 
set of isolation rules for detected incipient 
fault conditions can access the database and 
examine correlation coefficients of the 
various parametric attributes of each 
switching device. 


USER INTERFACE

The goal of the user interface is to
provide access to APEX that is intuitive
and requires only a small amount of training.
Communication between APEX and the user
is accomplished with easy-to-use, mouse-selectable
menus and color graphics and text
displays. The user interface screen presents 
a color display that is divided into three 
areas as shown in figure 3. The top portion 
of the screen is the control menu that allows 
the user to select the desired APEX function. 
When a function is selected, mouse- 
selectable options for that function appear in 
the options menu located in the lower 
portion of the screen. As APEX performs 
the selected function, the control menu is 
replaced by a status display window 
indicating the operational steps being 
executed. Fault detection and fault isolation 
results are shown within the main display 
area by means of color diagrams and text 
explanations. 

The control menu contains the 
following six mouse-selectable functions: 
MONITOR, DETECTION, ISOLATE 
CAUSE, RESET SYSTEM, LOG FILE, and 
EXIT. The MONITOR selection causes 
APEX to continuously acquire and check 
parametric values from the power distribution 
system. When either an active or incipient 
fault is detected, APEX stops monitoring and 
displays a "fault detected" message in the 
upper left corner of the user interface screen.
Once alerted, the user can display the fault 
detection analysis performed by the 
MONITOR function by selecting 
DETECTION in the control menu. When 
ISOLATE CAUSE is selected from the 
menu, APEX will access the fault isolation 
rules to determine the probable cause of the 
detected fault. The RESET SYSTEM 
function clears the working space of the 
APEX system to prepare APEX for 
monitoring the power distribution system. If 
the user wants to record the session with 
APEX, a file can be opened/closed and 
printed with the LOG FILE function. The 
EXIT function allows the user to
terminate APEX, switch over to the power
system data simulator, or communicate
with a remote planner/scheduler.

Recall that when a function is selected,
the options menu provides the user with
available options for that function. For
example, when the user selects the ISOLATE
CAUSE function, APEX will display the
probable cause of a detected fault and the
options menu will contain CONTINUE,
WHY?, and RECOMMEND. The CONTINUE
option will allow the user to exit from the 
ISOLATE CAUSE function and continue 
APEX operations with the control menu. If 
the user selects WHY?, APEX will display 
the reasoning process leading to the probable 
cause conclusion. The RECOMMEND 
option allows the user to request 
recommended action procedures for 
correcting the fault; this option also has a 
user confirmation/rejection sub-option during 
any procedural step requiring autonomous 
action of the APEX system, such as 
reconfiguring the power distribution system. 

The graphical displays in the main 
display area consist of a set of hierarchical 
diagrams that represent three different levels 
of information. The diagram in the main 
display area shown in figure 3 represents the 
overall power distribution system. When an
active fault is detected, the area of detection
in the diagram is outlined in red and a red
flashing cursor appears next to the area. For
an incipient fault condition, the area is 
outlined in yellow and has a yellow flashing 
cursor. The yellow indicates that a 
parametric value is probably going to go out 
of tolerance if preventive action is not taken. 
The user can get a more detailed diagram of 
an area by choosing the particular area of 
interest and clicking the mouse. Figure 4
shows the user interface screen after the user
selects PDUA in the top-level diagram.
In this PDUA subsystem diagram, the user
can easily see the location of the detected 
parametric abnormality at the switching 
device level. Figure 5 shows the switch 
level diagram after the user clicks the mouse 
on one of the switching devices, such as RBI 
3/3. Each switch level diagram displays the 
actual measured data values enabling the user 
to see which parametric attribute is out of 
tolerance. 


RECOMMENDED ACTIONS 

After APEX has isolated the probable 
cause of a detected fault or an incipient fault
condition, the user can ask for fault
recovery recommendations. APEX will
analyze available information about the 
current operating conditions with respect to 
the fault and display appropriate actions to 
be taken. Recommended actions pertain to 
both short- and long-term recovery. Short- 
term recovery determines if the fault can be 
tolerated for a period of time, if the power 
distribution can be reconfigured, or if load 
shedding is necessary. For long-term
recovery, the repair procedures needed to
correct the fault are determined after short-
term actions have been implemented.

Short-term recovery analysis is based 
on a set of "recommended action" rules for 
the particular fault condition. Information 
about available power sources, current 
configuration of the power distribution 
system, the scheduled run times of the loads, 
and the effects of the fault on the system are 
all considered during the analysis. If enough 
power is available and the effects of the fault 
are minimal with respect to remaining 
scheduled run time of the affected loads, 
then the fault can be tolerated and the loads 
are allowed to run to completion. If the fault 
is seriously affecting the amount of power 
reaching a particular load and an alternate 
path for power distribution exists, then the 
system can be reconfigured automatically, or 
with user confirmation, to allow the load to 
run to completion. When the fault cannot be 
tolerated and alternate power distribution 
paths are unavailable, then the schedule for 
the loads is replanned by a remote 
scheduling agent; this results in load 
shedding and a new schedule. 
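
The three short-term outcomes described above amount to a small decision procedure, sketched below in Python. This is an illustration of the decision order only, not the APEX recommended-action rules: the `FaultContext` fields and the returned action names are hypothetical, and in APEX each condition would itself be established by rules over the power-source, configuration, and schedule information.

```python
from dataclasses import dataclass


@dataclass
class FaultContext:
    power_sufficient: bool       # enough power is available for the affected loads
    effect_minimal: bool         # fault effect is small over the remaining run time
    alternate_path_exists: bool  # another distribution path can feed the load


def short_term_action(ctx: FaultContext) -> str:
    """Choose among the three short-term recovery outcomes described above."""
    if ctx.power_sufficient and ctx.effect_minimal:
        return "tolerate"     # let the loads run to completion
    if ctx.alternate_path_exists:
        return "reconfigure"  # automatically, or after user confirmation
    return "replan"           # remote scheduler sheds load and issues a new schedule


print(short_term_action(FaultContext(True, True, False)))
print(short_term_action(FaultContext(True, False, True)))
print(short_term_action(FaultContext(False, False, False)))
```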

After short-term recovery, the fault in 
the power distribution system needs to be 
repaired. The appropriate procedures needed 
to repair the power distribution system are 
determined by long-term recovery, which is
also based on a set of recommended action 
rules. In some cases, the cause of the fault 
is localized to a group of possibilities, and 
additional troubleshooting procedures are 
displayed to intelligently guide the user to 
further isolate the exact location and to make 
repairs. 


CONCLUDING REMARKS 

The APEX system consists of an 
integrated set of software agents, including a 
knowledge base, database, inference engine, 
data acquisition interface to the power 
distribution hardware, and a communication 
interface to a remote planner/scheduler. 

During the past year, advanced development 
of the APEX system has included addition of 
incipient fault analysis, an improved 
multilevel color user interface and a new 
recommended action facility. 

Incipient fault analysis adds a unique 
health monitoring capability to prevent faults 
by continuously monitoring all parametric 
values in the power distribution system. 

APEX can warn the user of potentially 
threatening fault conditions before power 
distribution interruptions are experienced. 

This continuous health monitoring will
relieve human operators of labor-intensive
mission control operations. Moreover, the 
type of continuous monitoring that APEX 
provides eliminates problems that can occur 
with human monitoring such as errors caused 
by fatigue. 

The color capability of the new user 
interface enhances the information display 
and provides a friendlier man-machine
interface. Location and type of detected 
faults are immediately recognized when 
flashing combined with color coding appears 
on multilevel displays. In addition, the user 
interface contains mouse-selectable menus 
that present appropriate options for accessing 
information and obtaining fault 
recovery/prevention assistance. 

The new recommended actions feature 
determines the most appropriate procedures 
for recovering from and preventing power 
distribution faults. The procedures are 
determined by rules stored in the knowledge 
base and the reasoning capability of APEX. 
Recommended actions consist of both short- 
and long-term recovery procedures necessary 
for maintaining the health of the power 
system. Execution of short-term recovery 
procedures restores power to scheduled 
loads, and execution of long-term actions 
effectively repairs isolated areas of the power 
distribution circuit. 

In future space applications, APEX can 
be applied to help maintain the operational 
health of the power distribution systems. 
APEX will be able to diagnose fault
conditions and recommend appropriate 
recovery procedures when experienced power 
system personnel are unavailable. By 
allowing APEX to autonomously monitor 
and analyze power distribution system data, 
faults can be detected before serious 
problems develop and costly power 
interruptions occur. Increased reliability of
space power distribution and a substantial
reduction in the human labor required for
routine monitoring of system operations are
the goals of the APEX project.



Figure 1. Power Distribution Test Bed

Figure 2. Incipient Fault Condition Analysis

Figure 3. User Interface (with power system diagram)

Figure 4. User Interface (with PDUA diagram)

Figure 5. User Interface (with switch diagram)